ABSTRACT

Real-time artificial intelligence is gaining increasing attention for applications in which conventional software methods are unable to meet technology needs. One such application area is the monitoring and analysis of complex systems. MARVEL, a distributed monitoring and analysis tool with multiple expert systems, has been developed and successfully applied to the automation of interplanetary spacecraft operations at NASA’s Jet Propulsion Laboratory. In this paper, we describe MARVEL’s implementation and verification approaches, the MARVEL architecture, and the specific benefits that have been realized by using MARVEL in operations.

1. INTRODUCTION 

MARVEL (Multimission Automation for Real-time Verification of spacecraft Engineering Link) is an automated system for telemetry monitoring and analysis at NASA’s Jet Propulsion Laboratory (JPL). MARVEL has been actively used for mission operations since 1989. It was first deployed for the Voyager spacecraft’s encounter with Neptune and has remained under incremental development since that time, with new deliveries occurring every six to ten months. MARVEL combines standard automation techniques with embedded knowledge-based systems to simultaneously provide real-time monitoring of spacecraft data, real-time analysis of anomaly conditions, and a variety of productivity enhancements. The primary goal of MARVEL is to combine conventional automation and knowledge-based techniques to improve accuracy and efficiency and to reduce the need for constant availability of human expertise. A second goal is to demonstrate the benefit of incorporating knowledge-based techniques into complex real-time applications.

The traditional spacecraft operations environment at JPL has not relied heavily on automation. This is because until fairly recently, software technology was insufficient for meeting the complex needs of this application. The traditional approach has involved large teams of highly trained specialists and support personnel for each spacecraft subsystem and each mission. The total operations staff for the two Voyager spacecraft during peak activity periods (such as planetary encounters) has consisted of over 100 individuals. This traditional approach has been used successfully for the Voyager mission, resulting in enormous volumes of scientific data from brief fly-by encounters.

Despite these past successes, the increasing number and complexity of missions will make this operations approach less feasible, for two reasons. First, the workforce costs of supporting this style of operations for multiple simultaneous missions are too great to be sustained by current NASA budgets. Second, with the exception of Voyager, missions will be returning significantly higher volumes of engineering and science data on a more continuous basis than in the past.

MARVEL provides user-interface functions, data access, data manipulation, data display, and data archiving within an X Window System/Motif environment. The detailed expertise for anomaly analysis is implemented with embedded knowledge-based systems. In the event of anomalies, the appropriate knowledge bases provide an analysis and recommendations for corrective action. MARVEL makes it possible for an analyst to effectively handle significantly more demanding real-time situations than in the past, because it automatically performs numerous tasks that previously required human effort. As a result, it has become possible for individual analysts to be responsible for several spacecraft subsystems during periods of low and moderate spacecraft activity, because MARVEL reduces both the level of training and the cognitive load required to perform routine operations.

MARVEL has demonstrated that automation enhances mission operations. Individual spacecraft analysts are no longer burdened with routine monitoring, information gathering, or preliminary analysis. Benefits include less workforce dedicated to routine tasks, earlier anomaly detection and diagnosis, leverage of scarce and valuable expertise, and reduced impact from personnel turnover. As a result, a MARVEL system for the Galileo mission (to Jupiter) is now under development.

2. REAL-TIME PERFORMANCE WITH EMBEDDED EXPERT SYSTEMS

Knowledge-based systems have not yet been sufficiently demonstrated for complex real-time applications because in such applications the amount of computation is nondeterministic, even in the presence of constant input data rates. This is increasingly recognized as a limitation of AI systems, and it makes AI approaches difficult to apply where they might otherwise be useful.

While future approaches may make it possible for intelligent systems to adapt more flexibly and dynamically to real-time situations [Horvitz 1989], [Hayes-Roth 1990], [Schwuttke 1992], it is unlikely that any single new method will be able to handle all real-time situations. However, judicious use of existing AI methods can make it possible to obtain improved performance, both in current systems and in more dynamic systems of the future. The following paragraphs describe some of the methods used in MARVEL that enable knowledge-based techniques to enhance the capabilities of a real-time system without degrading its performance. An example of knowledge-based analysis in MARVEL is provided elsewhere [Schwuttke 1992b].

2.1. Knowledge-based Methods Used Only Where Essential

For certain functions, such as diagnostics and anomaly correction, expert systems provide better implementational paradigms than more efficient conventional approaches. However, expert systems usually employ interpreters to perform inferencing on the knowledge base rather than compiling the knowledge base into native code. This tends to compromise performance and can pose difficulties in applications where the fastest possible response time is a critical factor in meeting real-time constraints [Barachini 1988], [Bahr 1990].

MARVEL achieves adequate response time by placing as much of the computing burden as possible into conventional algorithmic functions written in the C language. For example, C processes handle the initial tasks of allocating telemetry to a monitoring module and detecting anomalies. If a potential anomaly is found, the corresponding telemetry is passed to the appropriate expert system for verification. If the expert system concurs that the telemetry appears anomalous, the subsystem monitor then performs an algorithmic check to determine whether the anomalous telemetry is merely the result of data noise or corruption. After these preliminary tests are done and a probability of anomaly occurrence has been established, the subsystem monitor invokes knowledge-based processing for diagnosis of the anomaly and for recommendation of corrective action. This technique contributes to an overall response time that is sufficient for real-time monitoring.
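The ordering of stages described in Section 2.1 can be illustrated with a small Python sketch. It is not MARVEL's code (MARVEL's conventional stages were written in C); the channel name, limit values, noise threshold, and the two rule-based stand-ins are all hypothetical. The point is structural: the inexpensive algorithmic checks run on every sample, while the knowledge-based stages run only for samples that survive them.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    channel: str
    value: float
    previous: float  # last accepted value on the same channel


# Illustrative red limits and noise threshold; not MARVEL's actual values.
LIMITS = {"bus_voltage": (27.0, 31.0)}
MAX_STEP = 5.0  # assumed: a jump larger than this between samples looks like a bit error


class SubsystemMonitor:
    """Cheap algorithmic stages run first; rule-based stages run only when needed."""

    def __init__(self):
        self.kb_invocations = 0  # counts calls to the (expensive) knowledge-based stages

    def detect(self, r):  # stage 1: algorithmic limit check (C processes in MARVEL)
        lo, hi = LIMITS[r.channel]
        return not lo <= r.value <= hi

    def kb_verify(self, r):  # stage 2: stand-in for expert-system verification
        self.kb_invocations += 1
        return True  # a real rule base would decide this

    def looks_like_noise(self, r):  # stage 3: algorithmic noise/corruption check
        return abs(r.value - r.previous) > MAX_STEP

    def kb_diagnose(self, r):  # stage 4: stand-in for diagnosis and recommendation
        self.kb_invocations += 1
        lo, _ = LIMITS[r.channel]
        side = "below" if r.value < lo else "above"
        return {"channel": r.channel, "diagnosis": f"{side} limit", "action": "notify analyst"}

    def process(self, r):
        if not self.detect(r):
            return None  # nominal data never reaches the inference stages
        if not self.kb_verify(r):
            return None
        if self.looks_like_noise(r):
            return None
        return self.kb_diagnose(r)


monitor = SubsystemMonitor()
stream = [
    Reading("bus_voltage", 28.1, 28.0),  # nominal
    Reading("bus_voltage", 26.5, 28.1),  # plausible low-voltage anomaly
    Reading("bus_voltage", 0.0, 28.1),   # implausible jump: treated as corruption
]
results = [monitor.process(r) for r in stream]
print(results[1]["diagnosis"], monitor.kb_invocations)  # below limit 3
```

Because nominal samples exit at the first check, the cost of inference in this sketch grows with the number of suspected anomalies rather than with the telemetry rate.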

2.2. Hybrid Reasoning For Improved Performance in Knowledge-Based Methods

MARVEL augments several types of reasoning with conventional software methods. For example, MARVEL uses hybrid reasoning for detecting data that is uncertain, corrupted, or of decaying validity. In the MARVEL system there are two mechanisms for detecting data integrity problems. The first mechanism uses algorithmic calculations to check the validity of quantities such as telemetry values and data modes so that obviously noisy data can be eliminated from further processing. This technique is implemented at the level of the data management process and is used to monitor simple data types. The second mechanism is knowledge-based in nature and is implemented in rules. This mechanism employs the method of expectation-based data validation [Chandrasekaran 1984]. Data of questionable integrity is verified by cross-checking it with other data sources for correlation and corroboration. If an anomaly is indicated by a new incoming telemetry word, one can validate this hypothesis by examining known related data to see whether it has the values one would expect if the hypothesis were true. If the related data corroborates the initial indication, then the knowledge-based system can conclude that the new data is valid and the anomaly hypothesis is confirmed. Conversely, if the related data does not appear to be consistent with the new data, then the anomaly hypothesis is not proven. MARVEL’s expert systems have been explicitly designed so that they do not disregard the new data, which may provide the first evidence of a true anomaly that will eventually be confirmed by subsequent telemetry. Thus, whenever possible, the conclusions of the expert systems are based upon patterns of consistent data rather than on a single piece of data in isolation.

Figure 1. The event structure. The figure on the left depicts the general form of the structure; the figure on the right shows a specific instance of an event associated with three different anomalies.
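The two data-integrity mechanisms of Section 2.2 might be sketched as follows, under stated assumptions: the plausibility ranges, channel names, and the stuck-heater hypothesis are invented for illustration, and MARVEL implemented the second mechanism as rules rather than Python. The first function discards obviously noisy values; the validator confirms a hypothesis only when related channels show the values it predicts, and otherwise keeps the hypothesis pending instead of discarding the new data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Mechanism 1 (data management level): hypothetical physical plausibility ranges.
PLAUSIBLE = {"bus_current": (0.0, 10.0), "heater_temp": (-50.0, 120.0)}


def drop_noisy(sample: Dict[str, float]) -> Dict[str, float]:
    """Discard values that are obviously noise before any reasoning sees them."""
    return {ch: v for ch, v in sample.items()
            if ch in PLAUSIBLE and PLAUSIBLE[ch][0] <= v <= PLAUSIBLE[ch][1]}


@dataclass
class Hypothesis:
    name: str
    # related channel -> predicate its value should satisfy if the hypothesis is true
    expectations: Dict[str, Callable[[float], bool]]


class ExpectationValidator:
    """Mechanism 2: corroborate an anomaly hypothesis against related telemetry."""

    def __init__(self):
        self.pending: List[Hypothesis] = []  # unproven, but deliberately not discarded

    @staticmethod
    def _supported(h: Hypothesis, latest: Dict[str, float]) -> bool:
        return all(ch in latest and expected(latest[ch])
                   for ch, expected in h.expectations.items())

    def evaluate(self, h: Hypothesis, latest: Dict[str, float]) -> str:
        if self._supported(h, latest):
            return "confirmed"
        self.pending.append(h)  # the new data may be the first sign of a real anomaly
        return "unproven"

    def recheck(self, latest: Dict[str, float]) -> List[str]:
        """Re-test retained hypotheses as subsequent telemetry arrives."""
        confirmed, still_pending = [], []
        for h in self.pending:
            (confirmed if self._supported(h, latest) else still_pending).append(h)
        self.pending = still_pending
        return [h.name for h in confirmed]


v = ExpectationValidator()
# Hypothetical: a temperature rise suggests a heater stuck on, which should also draw extra current.
stuck_heater = Hypothesis("heater_stuck_on", {"bus_current": lambda amps: amps > 2.0})
print(v.evaluate(stuck_heater, drop_noisy({"bus_current": 250.0})))  # unproven
print(v.recheck(drop_noisy({"bus_current": 2.6})))                   # ['heater_stuck_on']
```

In the first call the implausible 250 A reading is removed by the algorithmic filter, so the hypothesis cannot be corroborated yet; it is retained and confirmed once consistent telemetry arrives.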

2.3. Temporal Reasoning with Minimal Impact on Real-Time Processing

Real-time systems often need to reason about past events and about the order in which they occurred. The MARVEL expert systems respond to events (symptoms) indicated in the spacecraft telemetry by attempting to identify and diagnose the specific subsystem anomalies that caused an event. To do this, the expert system may need to know about other spacecraft events that have occurred in the past and the sequence of their occurrence. This involves temporal reasoning, which is implemented in MARVEL using dynamically updated structures, as shown in Figure 1.

The structures contain the name of the event, the name of an anomaly that may have caused the event, a Boolean flag indicating whether the event has occurred and is currently relevant, and an integer specifying the sequence in which the event occurred relative to other events. The anomaly identifier is necessary since a particular event may have bearing on the diagnosis of more than one anomaly (that may or may not have occurred). Thus, a single event may point to multiple structures that are each associated with a different anomaly. The Boolean flag is set when the event associated with the structure is detected from telemetry. When this occurs, the relative time of the event is recorded in the structure. The validity of a Boolean flag expires after its corresponding anomaly is resolved, causing the flag to be reset so that it cannot contribute to the detection or diagnosis of the same anomaly unless the associated event occurs again.
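A minimal Python sketch of the event structure described above follows. The field names mirror the four items listed in the text; the event-to-anomaly mapping and all event and anomaly names are hypothetical. A single sequence counter gives the relative ordering used for temporal queries, and resolving an anomaly resets only the flags tied to that anomaly.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Tuple


@dataclass
class EventStructure:
    event: str
    anomaly: str            # anomaly whose diagnosis this event may bear on
    occurred: bool = False  # set when the event is detected in telemetry
    sequence: int = 0       # relative order of occurrence among events


class EventTable:
    def __init__(self, related: Dict[str, List[str]]):
        # related: event -> anomalies that may have caused it (hypothetical mapping)
        self.related = related
        self.structures: Dict[Tuple[str, str], EventStructure] = {}
        self._order = count(1)

    def record(self, event: str) -> None:
        """The event is confirmed once; one structure per related anomaly is set."""
        seq = next(self._order)
        for anomaly in self.related.get(event, []):
            s = self.structures.setdefault((event, anomaly), EventStructure(event, anomaly))
            s.occurred, s.sequence = True, seq

    def occurred_before(self, anomaly: str, first: str, second: str) -> bool:
        a = self.structures.get((first, anomaly))
        b = self.structures.get((second, anomaly))
        return bool(a and b and a.occurred and b.occurred and a.sequence < b.sequence)

    def resolve(self, anomaly: str) -> None:
        """Once the anomaly is resolved, its flags no longer count as evidence."""
        for s in self.structures.values():
            if s.anomaly == anomaly:
                s.occurred = False


table = EventTable({
    "memory_parity_error": ["ccs_memory_fault", "power_transient"],
    "processor_halt": ["ccs_memory_fault"],
})
table.record("memory_parity_error")
table.record("processor_halt")
print(table.occurred_before("ccs_memory_fault", "memory_parity_error", "processor_halt"))  # True
table.resolve("ccs_memory_fault")
print(table.occurred_before("ccs_memory_fault", "memory_parity_error", "processor_halt"))  # False
```

The structure linking the parity error to the separate power-transient hypothesis keeps its flag after the memory fault is resolved, reflecting the text's point that one event can bear on several anomalies independently.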

These structures are intended to have minimal impact on performance. Once an event is detected, a structure is created for each anomaly whose diagnosis may depend on that event. Thus the multiple pieces of evidence that confirm the occurrence of an event need only be evaluated once, regardless of how many anomalies may be related to that event. Also, event structures are not retained indefinitely. There is a time limit beyond which an event structure is considered no longer useful for identifying and diagnosing new anomalies. After this time limit has expired, a structure’s Boolean flag is reset to false regardless of whether its associated anomaly has been diagnosed. This minimizes the number of event structures that are active or relevant at any one time, which in turn reduces the number of event structure comparisons that must be performed during a rule evaluation cycle.
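The time-limit behavior can be sketched separately. The timestamp field, the 600-second limit, and the event and anomaly names are assumptions for illustration only; the text does not state MARVEL's actual limit or clock. Expired flags are reset whether or not their anomaly was diagnosed, so a rule only has to compare against the structures that remain active.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimedEvent:
    event: str
    anomaly: str
    occurred: bool
    detected_at: float  # seconds on an assumed monotonic ground clock


def expire(structures: List[TimedEvent], now: float, time_limit: float) -> List[TimedEvent]:
    """Reset stale flags, whether or not the anomaly was diagnosed; return the active ones."""
    for s in structures:
        if s.occurred and now - s.detected_at > time_limit:
            s.occurred = False
    return [s for s in structures if s.occurred]


def matching(active: List[TimedEvent], anomaly: str) -> List[str]:
    """What a rule would scan: only active structures are compared."""
    return [s.event for s in active if s.anomaly == anomaly]


table = [
    TimedEvent("voltage_dip", "power_transient", True, detected_at=0.0),
    TimedEvent("voltage_dip", "heater_fault", True, detected_at=0.0),
    TimedEvent("mode_change", "power_transient", True, detected_at=500.0),
]
active = expire(table, now=700.0, time_limit=600.0)  # illustrative 10-minute limit
print(matching(active, "power_transient"))  # ['mode_change']
```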

2.4. Multiple Knowledge Bases For Improved Focus of Attention

When significant events occur, real-time knowledge-based systems must focus their attention and resources on relevant parts of the search space in order to achieve adequate performance. Many expert system environments do not have an efficient method for doing this. One standard way to enable focus of attention is to apply different subsets of the domain rules within different contexts. MARVEL accomplishes this with separate knowledge bases for each spacecraft subsystem, and with rule contexts (mini-experts) within the individual knowledge bases.

A top-level data management process identifies incoming telemetry and determines which subsystem monitoring module to invoke for anomaly detection. When an anomaly is found, the subsystem monitor then invokes its corresponding expert system to perform the necessary analysis. This logical partitioning of input data among reasoning modules enables more rapid traversal of the search space and helps to ensure that conclusions and responses that are not relevant to the current analysis are not pursued. This approach has also contributed to the maintainability of the knowledge bases, in that several smaller knowledge bases are easier to maintain than a single large one.

Figure 2. The MARVEL architecture, shown at the left, consists of knowledge processes, conventional processes, and hybrid processes. It can be configured to run on one to four workstations, depending on operations needs. The subsystem architecture shown at the right provides a more detailed look at the structure of MARVEL’s hybrid subsystem processes.
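The partitioning into per-subsystem knowledge bases and rule contexts can be sketched as a two-level dispatch. Everything here is hypothetical: the channel identifiers, the subsystem labels, the context names, and the rules themselves. The counters show that a query touches only the rules of one context in one knowledge base.

```python
from typing import Callable, Dict, List, Tuple

Rule = Tuple[Callable[[dict], bool], str]  # (condition on facts, conclusion)


class KnowledgeBase:
    """One subsystem's rule base, partitioned into contexts (mini-experts)."""

    def __init__(self, contexts: Dict[str, List[Rule]]):
        self.contexts = contexts
        self.rules_tried = 0

    def analyze(self, context: str, facts: dict) -> List[str]:
        conclusions = []
        for condition, conclusion in self.contexts.get(context, []):
            self.rules_tried += 1
            if condition(facts):
                conclusions.append(conclusion)
        return conclusions


class DataManager:
    """Routes each telemetry channel to the knowledge base of the subsystem that owns it."""

    def __init__(self, owner: Dict[str, str], kbs: Dict[str, KnowledgeBase]):
        self.owner = owner
        self.kbs = kbs

    def analyze(self, channel: str, context: str, facts: dict) -> List[str]:
        return self.kbs[self.owner[channel]].analyze(context, facts)


ccs = KnowledgeBase({
    "memory": [(lambda f: f["parity_errors"] > 0, "suspect memory bit flip")],
    "power": [(lambda f: f["bus_v"] < 27.0, "suspect undervoltage")],
})
other = KnowledgeBase({"pointing": [(lambda f: f["error_deg"] > 0.5, "suspect gyro drift")]})
dm = DataManager({"E-101": "CCS", "E-310": "AACS"}, {"CCS": ccs, "AACS": other})
print(dm.analyze("E-101", "memory", {"parity_errors": 2}))  # ['suspect memory bit flip']
print(ccs.rules_tried, other.rules_tried)                    # 1 0
```

Neither the "power" context of the first knowledge base nor any rule of the second was evaluated, which is the focus-of-attention effect the section describes.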

3. THE DISTRIBUTED ARCHITECTURE 

There are many reasons for distributed problem solving [Bond 1988]. For example, distributed systems are often characterized by greater computational speed because of concurrent operation. Also, a distributed system can be significantly more cost-effective, since it can include a number of simple modules of low unit cost. Further, distributed systems may offer a more natural way to represent certain classes of problems that contain inherently partitionable sub-processes. Each of these reasons is considered important in the mission operations environment, and as a result, a distributed MARVEL environment has been implemented.

3.1 Implementation 

The distributed MARVEL architecture shown in Figure 2 is based on a central message-routing scheme. The various software modules are allocated among a configuration of UNIX workstations. The data management module receives telemetry data from JPL’s ground system. Each of the three subsystem monitors provides functions such as validation of telemetry, detection of anomalies, diagnosis of causes, and recommendation of corrective actions. The latter two functions are provided through intelligent reasoning modules that are embedded within each of the individual subsystem monitors. The remaining modules include the display processes for each of the three subsystem monitors, and the system-level reasoning module for diagnosing anomalies that manifest themselves in multiple subsystems (and therefore cannot be completely analyzed by any one subsystem alone).

The interconnectivity of the distributed system is provided by a TCP/IP central router program and a set of messaging routines that are linked into the subsystem processes. All MARVEL processes are connected to the central router by UNIX sockets.
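The central-router pattern can be illustrated with Python's standard `socket` module. This is a sketch under assumptions: the newline-framed `destination|payload` message format and the process names are invented, and `socket.socketpair()` stands in for the TCP connections that processes on separate workstations would open to the router. The router only looks up the destination name and forwards the bytes, so each process needs to know the router, not its peers.

```python
import socket
from typing import Dict


class CentralRouter:
    """Hub forwarding newline-framed 'destination|payload' messages (hypothetical protocol)."""

    def __init__(self):
        self.links: Dict[str, socket.socket] = {}  # process name -> router-side socket
        self.pending: Dict[str, bytes] = {}        # bytes received but not yet framed

    def attach(self, name: str) -> socket.socket:
        router_side, process_side = socket.socketpair()
        self.links[name] = router_side
        self.pending[name] = b""
        return process_side

    def forward_one(self, source: str) -> str:
        """Read one message from `source`, deliver it to its destination, return the destination."""
        buf = self.pending[source]
        while b"\n" not in buf:
            chunk = self.links[source].recv(4096)
            if not chunk:
                raise ConnectionError(f"{source} closed its connection")
            buf += chunk
        line, _, rest = buf.partition(b"\n")
        self.pending[source] = rest
        dest, _, payload = line.partition(b"|")
        # Re-label the message with its sender so the receiver knows who sent it.
        self.links[dest.decode()].sendall(source.encode() + b"|" + payload + b"\n")
        return dest.decode()


def read_message(sock: socket.socket) -> str:
    """Read one newline-terminated message (assumes one message in flight)."""
    data = b""
    while not data.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            break
        data += chunk
    return data.decode().rstrip("\n")


router = CentralRouter()
monitor = router.attach("ccs_monitor")
display = router.attach("ccs_display")
monitor.sendall(b"ccs_display|anomaly suspected on channel E-101\n")
router.forward_one("ccs_monitor")
print(read_message(display))  # ccs_monitor|anomaly suspected on channel E-101
```

A production router would multiplex many connections (for example with `selectors`) instead of being told which source to read; the single-step `forward_one` keeps the sketch deterministic.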

3.2 Performance 

For realistic systems with non-negligible communication overhead, the critical performance measure is the speedup $S(N)$ [Fox 1988], defined as

$$S(N) = \frac{T_{\mathrm{seq}}}{T_{\mathrm{conc}}(N)}.$$

In this equation, $N$ denotes the number of processors, and $T_{\mathrm{seq}}$ and $T_{\mathrm{conc}}(N)$ refer to the execution times of the sequential program and of the distributed program on $N$ processors, respectively. Distributed systems with a speedup $S(N) = 0.8N$ are considered to be very efficient [Fox 1988]; the minimum desired speedup for a distributed MARVEL system was $0.6N$.

The basic measurement of performance for the distributed MARVEL is S(N). However, it has not been possible to measure a unique value of S(N) because of the heterogeneous nature of the MARVEL modules. This heterogeneity arises because the processing loads of the four basic components (the data management module and the three subsystem modules) are not identical. As an alternative to this measurement, we use the lowest speedup among the individual subsystems. With a four-processor implementation, a speedup of 3.6, or 0.9N, was observed. This result indicates that MARVEL is a highly efficient distributed system. Two factors contribute to the success of these results. The first is the modularity inherent in the application (as is common in many other complex applications). The second is a distributed design that effectively minimizes the need for interprocess communication.
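The speedup definition and the efficiency levels quoted above reduce to simple arithmetic, which the sketch below makes explicit. Treating 0.8N and 0.6N as lower bounds for the two ratings is an interpretation, and the function names are hypothetical. Applied to the reported four-processor result, S(4) = 3.6 gives an efficiency of 3.6 / 4 = 0.9, above both levels.

```python
def speedup(t_seq: float, t_conc: float) -> float:
    """S(N) = T_seq / T_conc(N)."""
    return t_seq / t_conc


def conservative_speedup(module_speedups: dict) -> float:
    """With heterogeneous modules, report the lowest per-module speedup as S(N)."""
    return min(module_speedups.values())


def rate(s: float, n: int) -> str:
    """Compare S(N) with the 0.8N [Fox 1988] and 0.6N (MARVEL minimum) levels."""
    efficiency = s / n
    if efficiency >= 0.8:
        return f"efficiency {efficiency:.2f}: very efficient"
    if efficiency >= 0.6:
        return f"efficiency {efficiency:.2f}: meets the 0.6N goal"
    return f"efficiency {efficiency:.2f}: below the 0.6N goal"


print(rate(3.6, 4))  # efficiency 0.90: very efficient
```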

4. DISCUSSION 

The development of MARVEL has shown the value of a rapid development approach that emphasizes top-down design and bottom-up implementation. The implementation has been modular and incremental, with frequent deliveries (every 5 to 10 months) of new or enhanced capabilities. The result has been an automated tool that began as a simple software module for automating straightforward tasks and that has evolved over a period of five years into a sophisticated system for automated monitoring and analysis. The initial modular design enabled MARVEL to be developed incrementally, with each subsequent delivery providing greater breadth to the application. This approach has been instrumental to the success of the effort, because it was compatible with available budgets and encouraged user and sponsor confidence through frequent demonstration of results. In addition, the approach has influenced the validation and use of MARVEL as described below.

4.1 Verification of Expert Systems 

MARVEL verification has been ad hoc, largely because of a lack of formal procedures. Two methods have been used: carefully engineered test cases and on-line verification (involving parallel operations with human analysts). Most problems were detected with the test cases, but some were not detected until the software was used in an on-line mode. Newly delivered modules were subject to an on-line verification period, typically on the order of one month. On-line verification allowed continual comparison with the results of manual approaches so that reasonable confidence in the automated system could be obtained.
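The continual comparison used during on-line verification can be pictured as a comparison of two detection logs. The sketch assumes a hypothetical log format (anomaly identifier mapped to detection time on a shared clock); the text does not describe how MARVEL's parallel-operations results were actually recorded. It reports agreements, misses, MARVEL-only detections, and the anomalies MARVEL flagged first.

```python
from typing import Dict, List


def compare_parallel_ops(marvel: Dict[str, float], analysts: Dict[str, float]) -> Dict[str, List[str]]:
    """Compare anomaly detections from MARVEL and from human analysts.

    Both arguments map a hypothetical anomaly identifier to its detection time
    (seconds on a shared clock).
    """
    both = marvel.keys() & analysts.keys()
    return {
        "agreed": sorted(both),
        "missed_by_marvel": sorted(analysts.keys() - marvel.keys()),
        "marvel_only": sorted(marvel.keys() - analysts.keys()),
        "marvel_first": sorted(a for a in both if marvel[a] < analysts[a]),
    }


report = compare_parallel_ops(
    marvel={"A1": 10.0, "A2": 42.0, "A3": 90.0},
    analysts={"A1": 55.0, "A2": 40.0},
)
print(report["marvel_only"], report["marvel_first"])  # ['A3'] ['A1']
```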

The primary advantage of this approach has been its minimal impact on development costs. However, there have also been several disadvantages. On a few occasions, minor bugs in MARVEL went undetected until the end of the parallel-operations phase. This temporarily undermined user confidence, particularly among users who were not enthusiastic about automation. A second disadvantage is that, without formal verification procedures, there have been occasional questions about whether MARVEL should be formally accepted as "official" ground software for mission operations. The current lack of solid answers in this area would prevent the use of MARVEL’s AI modules for certain tasks that are considered mission critical, but has not prevented the use of these modules in an advisory mode.

4.2 Use and Benefit of MARVEL 

MARVEL has been in active, daily use since it was first deployed in 1989. The current version performs real-time monitoring for three spacecraft subsystems. These functions previously required the presence of human analysts for a minimum of eight hours per subsystem per day. During planetary encounters, human presence was required on a twenty-four-hour basis. MARVEL also performs non-real-time functions that were previously unautomated. These functions save from 30 minutes per week (for clock drift analysis) to 2 hours per day (for daily report generation). During MARVEL’s on-line tenure, it has detected all the anomalies that occurred within its domain. During parallel operations, most anomalies were first detected by MARVEL. On two occasions, MARVEL detected anomalies that operations personnel believe they might have overlooked, because the quantities of data transmitted at those times were larger than could reasonably be handled without automated assistance.

Initial emphasis on productivity enhancement resulted in an early version of MARVEL that (according to the responsible operations supervisor) would have made real-time CCS subsystem workforce reductions of 60% (3 out of 5 analysts) possible during the Neptune encounter, had MARVEL been approved for stand-alone rather than parallel operations. Subsequent to the Neptune encounter, significant workforce reductions have been implemented for all spacecraft subsystems because of post-encounter budget cuts. MARVEL played a substantial role in simplifying the transition to a reduced workforce for the subsystems for which it was available.

The initial emphasis on productivity enhancement temporarily curtailed the development of MARVEL’s expert systems, because it was perceived that diagnostic systems did not improve the efficiency of operations. This perception stemmed from the observation that anomaly analysis was only required in the presence of spacecraft anomalies, which did not occur with sufficient frequency to warrant an automated approach, particularly since human confirmation of the expert system analysis would still be required.

However, the post-encounter workforce reductions caused renewed interest in expert systems. This time, the goal is not workforce reduction but the preservation of mission expertise. The current analysts are new to the mission and, for the most part, do not have the experience of the previous staff. The new personnel will have fewer opportunities to gain such experience: although the Voyager interstellar mission is scheduled to continue until approximately 2018, spacecraft activity is at a low level. This means that there are far fewer opportunities for learning about the spacecraft and its operation. There is concern that analysts with the experience to handle future anomalies will be less readily available, or that they will have retired. As a result, MARVEL’s expert systems are being expanded to provide information that is based on the expertise of former analysts.

5. SUMMARY

This paper has presented methods for combining conventional software with AI techniques for use in real-time problem-solving systems. The methods described have been presented in the context of the MARVEL system, which has provided a continuous and evolving demonstration of the success of the approach since Voyager’s Neptune encounter in August 1989. These techniques have been implemented in a distributed environment that will accommodate the real-time demands of NASA’s more recently launched interplanetary missions.